ENH: Different initialization methods for LoRA #1189
Conversation
This PR adds the possibility to use different initialization methods for LoRA, which is a requirement for a completely backwards-compatible adoption of PEFT in diffusers.

Description

The default is still the same as always, namely the one from the reference implementation by Microsoft. On top of that, it is now possible to pass `init_lora_weights='gaussian'` to initialize the LoRA weights the way diffusers does by default, namely from a normal distribution scaled by 1/r. The init method currently applies to LoRA linear and conv layers, but not to embedding layers, which are always initialized from a normal distribution (and are probably irrelevant for diffusers). In the future, similar extensions could be added for other adapter methods.

Notes

For testing, a rather simple test is added which calculates the Kolmogorov-Smirnov distance between the weights of a LoRA layer and the expected distribution. If someone has a better idea for a test, please let me know.
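For illustration, here is a minimal usage sketch of the new option; the base model and target modules below are placeholders for the example, not part of this PR:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; any model supported by PEFT works the same way.
base_model = AutoModelForCausalLM.from_pretrained("gpt2")

# Omitting init_lora_weights keeps the default (Microsoft reference) init.
# Passing "gaussian" initializes the LoRA weights from a normal distribution
# scaled by 1/r, matching the diffusers default.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],  # placeholder target modules for the gpt2 example
    init_lora_weights="gaussian",
)
peft_model = get_peft_model(base_model, config)
```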
@@ -18,7 +18,9 @@
 extras["quality"] = ["black ~= 22.0", "ruff>=0.0.241", "urllib3<=2.0.0"]
 extras["docs_specific"] = ["hf-doc-builder"]
 extras["dev"] = extras["quality"] + extras["docs_specific"]
-extras["test"] = extras["dev"] + ["pytest", "pytest-cov", "pytest-xdist", "parameterized", "datasets", "diffusers<0.21.0"]
+extras["test"] = extras["dev"] + [
+    "pytest", "pytest-cov", "pytest-xdist", "parameterized", "datasets", "diffusers<0.21.0", "scipy"
+]
Why's the version restriction needed on the diffusers side?
It was pinned in #936 and can probably be unpinned now.
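As an aside, the `scipy` entry added in the diff above is presumably what enables the Kolmogorov-Smirnov check mentioned in the PR description. A rough sketch of what such a distribution test could look like, assuming a toy base model and illustrative module names (this is not the PR's actual test code):

```python
import torch
from scipy import stats
from peft import LoraConfig, get_peft_model

r = 8
# Toy base model; the module name "0" used in target_modules is illustrative.
base = torch.nn.Sequential(torch.nn.Linear(1000, 1000))
config = LoraConfig(r=r, target_modules=["0"], init_lora_weights="gaussian")
model = get_peft_model(base, config)

# Gather the lora_A weights of the default adapter and compare them against
# a normal distribution with std 1/r via the Kolmogorov-Smirnov test.
weights = torch.cat(
    [
        module.weight.detach().flatten()
        for name, module in model.named_modules()
        if name.endswith("lora_A.default")
    ]
).numpy()
statistic, p_value = stats.kstest(weights, "norm", args=(0.0, 1 / r))
# Crude threshold for a sketch; a real test would pick the tolerance more carefully.
assert p_value > 0.05, f"LoRA A weights deviate from N(0, 1/r) (p={p_value:.3f})"
```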
Very nice! Particularly, the tests are very thorough!
How should this be reflected in huggingface/diffusers#5419?
Hmm, not sure about that PR specifically. In general, for diffusers to use this feature, every time that a …
Hmm, the changes sound good to me. @patrickvonplaten WDYT?
Thank you @BenjaminBossan for adding support for different weight initialization methods for LoRA. The tests are really good; I never thought I would see statistical significance analysis in tests. LGTM! 🤩
Before merging, should we wait until we have finalized the change required on the diffusers side, in case we find that this PR doesn't quite cut it?
Makes sense.
Do you mean testing it on huggingface/diffusers#5388? I can do that tomorrow.
I can confirm that this works: https://wandb.ai/sayakpaul/dreambooth-lora/runs/bub8wjc3?workspace=user-sayakpaul
Great, thanks for checking it.